Debellor: A Data Mining Platform with Stream Architecture
نویسنده
چکیده
This paper introduces Debellor (www.debellor.org) – an open source extensible data mining platform with stream-based architecture, where all data transfers between elementary algorithms take the form of a stream of samples. Data streaming enables implementation of scalable algorithms, which can efficiently process large volumes of data, exceeding available memory. This is very important for data mining research and applications, since the most challenging data mining tasks involve voluminous data, either produced by a data source or generated at some intermediate stage of a complex data processing network. Advantages of data streaming are illustrated by experiments with clustering time series. The experimental results show that even for moderatesize data sets streaming is indispensable for successful execution of algorithms, otherwise the algorithms run hundreds times slower or just crash due to memory shortage. Stream architecture is particularly useful in such application domains as time series analysis, image recognition or mining data streams. It is also the only efficient architecture for implementation of online algo-
منابع مشابه
Debellor: Open Source Modular Platform for Scalable Data Mining
This paper introduces Debellor (www.debellor.org) – an open source extensible data mining platform with stream-oriented architecture, where all data transfers between elementary algorithms take the form of a stream of samples. Data streaming enables implementation of scalable algorithms, which can efficiently process large volumes of data, exceeding available memory. This is very important for ...
متن کاملSAMOA: a platform for mining big data streams
Social media and user generated content are causing an ever growing data deluge. The rate at which we produce data is growing steadily, thus creating larger and larger streams of continuously evolving data. Online news, micro-blogs, search queries are just a few examples of these continuous streams of user activities. The value of these streams relies in their freshness and relatedness to ongoi...
متن کاملSAMOA: scalable advanced massive online analysis
samoa (Scalable Advanced Massive Online Analysis) is a platform for mining big data streams. It provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms. It features a pluggable architecture that allows it to run on several...
متن کاملAn Architecture for Context-Aware Adaptive Data Stream Mining
In resource-constrained devices, adaptation of data stream processing to variations of data rates, availability of resources and environment changes is crucial for consistency and continuity of running applications. Context-aware and resource-aware adaptation, as a new dimension of research in data stream mining, enhances and improves distributed data stream processing tasks. Context-awareness ...
متن کاملSPIN!-An Enterprise Architecture for Spatial Data Mining
The rapidly expanding market for Spatial Data Mining systems and technologies is driven by pressure from the public sector, environmental agencies and industry to provide innovative solutions to a wide range of different problems. The main objective of the described spatial data mining platform is to provide an open, highly extensible, n-tier system architecture based on Java 2 Platform, Enterp...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Trans. Rough Sets
دوره 9 شماره
صفحات -
تاریخ انتشار 2008